

Patent text classification based on ALBERT and bidirectional gated recurrent unit

WEN Chaodong, ZENG Cheng, REN Junwei, ZHANG Yan

Journal of Computer Applications 2021, 41 (2): 407-412. DOI: 10.11772/j.issn.1001-9081.2020050730


With the rapid growth in the number of patent applications, the demand for automatic patent text classification is increasing. Most existing patent text classification algorithms use methods such as Word2vec and Global Vectors (GloVe) to obtain word vector representations of the text; these methods discard much of the word position information and cannot express the complete semantics of the text. To address these problems, a multilevel patent text classification model named ALBERT-BiGRU was proposed, combining ALBERT (A Lite BERT) with a Bidirectional Gated Recurrent Unit (BiGRU). In this model, dynamic word vectors pre-trained by ALBERT replace the static word vectors trained by traditional methods such as Word2vec, improving the representational ability of the word vectors. The BiGRU neural network is then used for training, which preserves as much as possible the semantic associations between distant words in the patent text. In validation experiments on the patent text dataset published by the State Information Center, compared with Word2vec-BiGRU and GloVe-BiGRU, ALBERT-BiGRU improved accuracy by 9.1 and 10.9 percentage points respectively at the department level, and by 9.5 and 11.2 percentage points respectively at the big class level. The experimental results show that ALBERT-BiGRU effectively improves patent text classification at different levels.
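The bidirectional GRU stage described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the per-token vectors below are random stand-ins for ALBERT's contextual embeddings, the weights are untrained, and every name and dimension (`gru_step`, `bigru_encode`, `INPUT_DIM`, `HIDDEN_DIM`) is an assumption made for illustration. It only shows how a forward pass and a backward pass over the same sequence are concatenated, so that each position carries context from both directions.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def init_gru(input_dim, hidden_dim, rng):
    """Random weights (W, U, b) for the update, reset and candidate gates."""
    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {g: (mat(hidden_dim, input_dim), mat(hidden_dim, hidden_dim), [0.0] * hidden_dim)
            for g in ("z", "r", "h")}

def gru_step(params, x, h):
    """One GRU update of hidden state h given input vector x."""
    def pre(gate, hidden):
        W, U, b = params[gate]
        return [a + c + d for a, c, d in zip(matvec(W, x), matvec(U, hidden), b)]
    z = [sigmoid(v) for v in pre("z", h)]                      # update gate
    r = [sigmoid(v) for v in pre("r", h)]                      # reset gate
    cand = [math.tanh(v) for v in pre("h", [ri * hi for ri, hi in zip(r, h)])]
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def run_gru(params, seq, hidden_dim):
    """Return the hidden state after each element of seq."""
    h = [0.0] * hidden_dim
    states = []
    for x in seq:
        h = gru_step(params, x, h)
        states.append(h)
    return states

def bigru_encode(seq, fwd, bwd, hidden_dim):
    """Concatenate forward and backward hidden states at every position."""
    forward = run_gru(fwd, seq, hidden_dim)
    backward = run_gru(bwd, seq[::-1], hidden_dim)[::-1]
    return [f + b for f, b in zip(forward, backward)]

rng = random.Random(0)
INPUT_DIM, HIDDEN_DIM = 4, 3
# Stand-in for contextual token vectors (5 tokens, 4 dimensions each).
tokens = [[rng.uniform(-1, 1) for _ in range(INPUT_DIM)] for _ in range(5)]
fwd = init_gru(INPUT_DIM, HIDDEN_DIM, rng)
bwd = init_gru(INPUT_DIM, HIDDEN_DIM, rng)
encoded = bigru_encode(tokens, fwd, bwd, HIDDEN_DIM)
```

In a full classifier, a pooled version of `encoded` would typically feed a softmax layer over the patent categories; that layer is omitted here because the abstract does not describe it.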


Malicious webpage integrated detection method based on Stacking ensemble algorithm

PIAOYANG Heran, REN Junling

Journal of Computer Applications 2019, 39 (4): 1081-1088. DOI: 10.11772/j.issn.1001-9081.2018091926


To address the high resource cost, long detection period and low classification performance of mainstream malicious webpage detection techniques, a Stacking-based integrated detection method for malicious webpages was proposed, which applies an ensemble of heterogeneous classifiers to malicious webpage detection and recognition. The detection model was obtained by extracting and analyzing the relevant webpage features and then performing classification and ensemble learning. In the detection model, the primary classifiers were built with the K-Nearest Neighbors (KNN), logistic regression and decision tree algorithms respectively, and a Support Vector Machine (SVM) classifier served as the secondary classifier. Compared with traditional malicious webpage detection methods, the proposed method improves recognition accuracy by 0.7% and reaches an accuracy of 98.12% with low resource consumption and high speed. The experimental results show that the detection model built by the proposed method recognizes malicious webpages efficiently and accurately.
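The two-level structure described in this abstract (KNN, logistic regression and a decision tree as primary classifiers, an SVM as the secondary classifier) can be sketched in plain Python. This is an illustrative sketch under several assumptions, not the paper's model: the two numeric features are synthetic, the decision tree is reduced to a depth-1 stump, the SVM is a linear one trained by sub-gradient descent on the hinge loss, and all class names, function names and hyperparameters are hypothetical.

```python
import math
import random

def dot(w, x):
    return sum(a * b for a, b in zip(w, x))

class KNN:
    def __init__(self, k=3):
        self.k = k
    def fit(self, X, y):
        self.X, self.y = X, y
        return self
    def predict(self, x):
        nearest = sorted(zip(self.X, self.y), key=lambda p: math.dist(p[0], x))[:self.k]
        return int(2 * sum(label for _, label in nearest) > self.k)

class LogReg:
    def fit(self, X, y, lr=0.05, epochs=100):
        self.w, self.b = [0.0] * len(X[0]), 0.0
        for _ in range(epochs):
            for x, t in zip(X, y):
                z = max(-30.0, min(30.0, dot(self.w, x) + self.b))
                g = 1 / (1 + math.exp(-z)) - t        # gradient of log loss w.r.t. z
                self.w = [w - lr * g * v for w, v in zip(self.w, x)]
                self.b -= lr * g
        return self
    def predict(self, x):
        return int(dot(self.w, x) + self.b > 0)

class Stump:
    """Depth-1 decision tree: best single-feature threshold by training error."""
    def fit(self, X, y):
        best = None
        for j in range(len(X[0])):
            for thr in sorted({x[j] for x in X}):
                for sign in (0, 1):
                    err = sum((int(x[j] > thr) ^ sign) != t for x, t in zip(X, y))
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        _, self.j, self.thr, self.sign = best
        return self
    def predict(self, x):
        return int(x[self.j] > self.thr) ^ self.sign

class LinearSVM:
    def fit(self, X, y, lr=0.05, epochs=100, lam=0.01):
        self.w, self.b = [0.0] * len(X[0]), 0.0
        for _ in range(epochs):
            for x, t in zip(X, y):
                s = 1 if t == 1 else -1
                violated = s * (dot(self.w, x) + self.b) < 1
                self.w = [w * (1 - lr * lam) for w in self.w]   # L2 shrinkage
                if violated:                                     # hinge-loss step
                    self.w = [w + lr * s * v for w, v in zip(self.w, x)]
                    self.b += lr * s
        return self
    def predict(self, x):
        return int(dot(self.w, x) + self.b > 0)

def fit_stacking(X, y, base_classes, meta, folds=5):
    """Train the meta learner on out-of-fold predictions of the base learners."""
    n = len(X)
    meta_X = [None] * n
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        models = [c().fit([X[i] for i in train], [y[i] for i in train]) for c in base_classes]
        for i in range(f, n, folds):
            meta_X[i] = [m.predict(X[i]) for m in models]
    meta.fit(meta_X, y)
    bases = [c().fit(X, y) for c in base_classes]    # refit on all training data
    return lambda x: meta.predict([m.predict(x) for m in bases])

def make_data(n, rng):
    """Synthetic two-feature samples; label 1 stands for a malicious page."""
    y = [i % 2 for i in range(n)]
    X = [[rng.gauss(3.0 * t, 1.0), rng.gauss(3.0 * t, 1.0)] for t in y]
    return X, y

rng = random.Random(0)
X_train, y_train = make_data(100, rng)
X_test, y_test = make_data(40, rng)
predict = fit_stacking(X_train, y_train, [KNN, LogReg, Stump], LinearSVM())
accuracy = sum(predict(x) == t for x, t in zip(X_test, y_test)) / len(y_test)
```

Training the secondary classifier on out-of-fold predictions, rather than on predictions for data the primary classifiers have already seen, is the usual way to keep it from simply trusting overfitted primary classifiers. The abstract does not say whether the paper uses this scheme.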


Automatic generation of test data for extended finite state machine models based on Tabu search algorithm

REN Jun, ZHAO Rui-lian, LI Zheng

Journal of Computer Applications 2011, 31 (09): 2440-2443. DOI: 10.3724/SP.J.1087.2011.02440


Test case generation for Extended Finite State Machine (EFSM) models includes test path generation and test data generation. However, most current research on EFSM testing focuses on test path generation. To explore automatic test generation, a path-oriented test data generation method for EFSM models was proposed. A Tabu Search (TS) strategy was adopted to generate test data automatically, and the key factors affecting the performance of test data generation for EFSM models were analyzed. The test generation efficiency was also compared with that of a Genetic Algorithm (GA). The experimental results show that the proposed method is promising and effective, and that it clearly outperforms the GA in test generation for EFSM models.
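The idea of path-oriented test data generation with Tabu Search can be sketched as follows. This is a simplified illustration, not the method from the paper: the three-transition path, its guards, the input range 0 to 100, the neighbourhood moves and the tabu tenure are all hypothetical. The fitness is the sum of branch distances of the guards along the path, so a value of 0 means the inputs drive the EFSM down the whole path.

```python
from collections import deque

def path_distance(inputs):
    """Sum of branch distances for the guards on one hypothetical EFSM path."""
    a, b, c = inputs
    dist = 0
    dist += 0 if a > 50 else 50 - a + 1                      # t1 guard: a > 50
    balance = a                                              # t1 action
    dist += 0 if b <= balance - 10 else b - (balance - 10)   # t2 guard: b <= balance - 10
    balance -= b                                             # t2 action
    dist += abs(balance - c)                                 # t3 guard: balance == c
    return dist

def neighbours(inputs, low=0, high=100, steps=(1, 10)):
    """Change one input by a small or large step, staying inside the range."""
    for i in range(len(inputs)):
        for step in steps:
            for delta in (-step, step):
                value = inputs[i] + delta
                if low <= value <= high:
                    yield inputs[:i] + (value,) + inputs[i + 1:]

def tabu_search(fitness, start, max_iters=500, tenure=20):
    """Move to the best non-tabu neighbour each round, even if it is worse."""
    current = best = start
    tabu = deque([start], maxlen=tenure)      # recently visited solutions
    for _ in range(max_iters):
        if fitness(best) == 0:
            break
        # Aspiration: a tabu move is allowed if it beats the best so far.
        candidates = [n for n in neighbours(current)
                      if n not in tabu or fitness(n) < fitness(best)]
        if not candidates:
            break
        current = min(candidates, key=fitness)
        tabu.append(current)
        if fitness(current) < fitness(best):
            best = current
    return best

start = (0, 0, 0)
test_data = tabu_search(path_distance, start)
```

Because the search accepts non-improving moves and forbids recently visited solutions, it can cross plateaus in the branch-distance landscape where a plain hill climber would stall.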